Score Aggregation from Multiple Sources and Training in the Context of Lexicon Reduction using Holistic Features

Authors

  • Sriganesh Madhvanath
  • Venu Govindaraju
Abstract

Holistic methods developed for small, static lexicons are not easily extended to the large and dynamic lexicon scenario owing to word-level feature variability and paucity of training samples. A methodology of coarse holistic features and heuristic prediction of ideal features from ASCII is proposed to address these issues. The proposed methodology is based on the axiom that real-world examples of handwritten words may be viewed as the ideal exemplar of the word class distorted by the scriptor, stylus, medium and intervening electronic imaging processes. On a test set of 3,000 handwritten city names, we achieved a 70% reduction in the lexicon size (1,000) with 98.7% accuracy.
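
As a rough illustration of the idea in the abstract, the sketch below predicts coarse "ideal" holistic features directly from each lexicon entry's ASCII spelling and discards entries whose predicted features are too far from those observed in the image. The feature set (word length, ascender count, descender count), the letter classes, the tolerances, and the names `predict_features` and `reduce_lexicon` are all assumptions made for this example; the abstract does not specify the authors' actual features or matching rule.

```python
# Hypothetical letter classes; the paper's real feature definitions are not given here.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")


def predict_features(word):
    """Predict coarse holistic features (length, ascenders, descenders) from ASCII."""
    length = len(word)
    # Assumption: capital letters are counted as ascenders.
    ascenders = sum(1 for c in word if c.isupper() or c in ASCENDERS)
    descenders = sum(1 for c in word if c in DESCENDERS)
    return (length, ascenders, descenders)


def reduce_lexicon(observed, lexicon, tolerance=(2, 1, 1)):
    """Keep only entries whose predicted features lie within tolerance of the observed ones."""
    kept = []
    for word in lexicon:
        predicted = predict_features(word)
        if all(abs(o - p) <= t for o, p, t in zip(observed, predicted, tolerance)):
            kept.append(word)
    return kept


# Toy lexicon of city names (illustrative only, not the paper's data).
lexicon = ["Buffalo", "Albany", "Syracuse", "Rochester", "Ithaca", "Troy"]

# Features assumed to have been extracted from an image of the word "Albany".
observed = (6, 3, 1)

kept = reduce_lexicon(observed, lexicon)
reduction = 1 - len(kept) / len(lexicon)
print(kept, f"reduction = {reduction:.0%}")
```

The tolerances stand in for the distortion the abstract attributes to the writer, stylus, medium and imaging process: looser tolerances keep the true word more often but prune fewer entries, which is the trade-off between reduction and accuracy that the reported figures describe.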

Similar resources

Emotion Detection in Persian Text; A Machine Learning Model

This study aimed to develop a computational model for recognizing emotion in Persian text as a supervised machine learning problem. We considered the Plutchik emotion model as the supervised learning criterion and a Support Vector Machine (SVM) as the baseline classifier. We also used the NRC lexicon and contextual features as training data and components of the model. One hundred selected texts including pol...

Code-Copying in the Balochi Language of Sistan

This empirical study deals with language contact phenomena in Sistan. Code-copying is viewed as a strategy of linguistic behavior when a dominated language acquires new elements in lexicon, phonology, morphology, syntax, pragmatic organization, etc., which can be interpreted as copies of a dominating language. In this framework Persian is regarded as the model code which provides elements for b...

A Two-Stage Method for Recognizing Handwritten Farsi Words Using Adaptive Blocking of the Image Gradient

This paper presents a two-step method for offline handwritten Farsi word recognition. In the first step, to improve recognition accuracy and speed, an algorithm is proposed for initially eliminating lexicon entries unlikely to match the input image. For lexicon reduction, the words of the lexicon are clustered using the ISOCLUS and hierarchical clustering algorithms. Clustering is based on the featur...

Different Aggregation Strategies for Generically Contextualized Sentiment Lexicons

Sentiment detection has gained relevance in the last years due to the vast amount of publicly available opinion in the form of Web forums or blogs. Yet, it still suffers from the ambiguity of language, lowering the efficacy and accuracy of sentiment detection systems. Thus, it is important to also invoke context information to refine the initial values of sentiment terms. Moreover, domain-indep...

Using Multiple-Variable Matching to Identify EFL Ecological Sources of Differential Item Functioning

Context is a vague notion with numerous building blocks, which makes inferences from language test scores quite convoluted. This study made use of a model of item responding that strives to theorize the contextual infrastructure of differential item functioning (DIF) research and to help specify the sources of DIF. Two steps were taken in this research: first, to identify DIF by gender grouping via l...

Publication date: 2000